33 research outputs found
SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training
Non-task-oriented dialog models suffer from poor quality and non-diverse
responses. To overcome limited conversational data, we apply Simulated Multiple
Reference Training (SMRT; Khayrallah et al., 2020), and use a paraphraser to
simulate multiple responses per training prompt. We find SMRT improves over a
strong Transformer baseline as measured by human and automatic quality scores
and lexical diversity. We also find SMRT is comparable to pretraining in human
evaluation quality, and outperforms pretraining on automatic quality and
lexical diversity, without requiring related-domain dialog data.
Comment: EMNLP 2020 Camera Ready
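The SMRT recipe above amounts to data augmentation: a paraphraser expands each (prompt, response) training pair into several reference responses. A minimal sketch, with a hypothetical `paraphrase_fn` and a toy word-swap paraphraser standing in for a real model:

```python
def simulate_references(prompt, response, paraphrase_fn, n=3):
    """Expand one (prompt, response) pair into multiple training pairs
    by paraphrasing the response, in the spirit of SMRT.
    `paraphrase_fn` is a hypothetical paraphraser: str -> str."""
    references = [response] + [paraphrase_fn(response) for _ in range(n)]
    return [(prompt, ref) for ref in references]

# Toy paraphraser standing in for a real paraphrase model (assumption).
def toy_paraphrase(text):
    swaps = {"hello": "hi", "good": "great"}
    return " ".join(swaps.get(w, w) for w in text.split())

pairs = simulate_references("How are you?", "good hello", toy_paraphrase, n=2)
# Each prompt now has the original response plus paraphrased references.
```

In practice the paraphraser would be a trained model and the expanded pairs would feed a standard sequence-to-sequence training loop.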
Degendering Resumes for Fair Algorithmic Resume Screening
We investigate whether it is feasible to remove gendered information from
resumes to mitigate potential bias in algorithmic resume screening. Using a
corpus of 709k resumes from IT firms, we first train a series of models to
classify the self-reported gender of the applicant, thereby measuring the
extent and nature of gendered information encoded in resumes. We then conduct a
series of gender obfuscation experiments, where we iteratively remove gendered
information from resumes. Finally, we train a resume screening algorithm and
investigate the trade-off between gender obfuscation and screening algorithm
performance. Results show: (1) There is a significant amount of gendered
information in resumes. (2) A lexicon-based gender obfuscation method (i.e.,
removing tokens that are predictive of gender) can remove much of the
gendered information; however, beyond a certain point, the performance of
the resume screening algorithm starts to suffer. (3) General-purpose gender
debiasing methods for NLP models, such as removing the gender subspace from
embeddings, are not effective in obfuscating gender.
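The lexicon-based obfuscation described above can be sketched as masking lexicon tokens in a resume. The lexicon entries and mask token below are illustrative assumptions, not taken from the paper; in the actual setup the lexicon would be built from tokens a classifier finds predictive of gender:

```python
def obfuscate(resume_tokens, gendered_lexicon, mask="[MASKED]"):
    """Mask tokens that appear in a lexicon of gender-predictive terms
    (lexicon-based gender obfuscation, sketched)."""
    return [mask if t.lower() in gendered_lexicon else t for t in resume_tokens]

# Illustrative lexicon entries (assumption, not from the paper).
lexicon = {"softball", "fraternity"}
tokens = "Captain of the softball team".split()
print(obfuscate(tokens, lexicon))
# ['Captain', 'of', 'the', '[MASKED]', 'team']
```

Iteratively growing the lexicon trades off residual gendered information against screening-algorithm performance, which is the trade-off the experiments measure.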
Modeling Empathy and Distress in Reaction to News Stories
Computational detection and understanding of empathy is an important factor
in advancing human-computer interaction. Yet to date, text-based empathy
prediction has the following major limitations: It underestimates the
psychological complexity of the phenomenon, adheres to a weak notion of ground
truth where empathic states are ascribed by third parties, and lacks a shared
corpus. In contrast, this contribution presents the first publicly available
gold standard for empathy prediction. It is constructed using a novel
annotation methodology which reliably captures empathy assessments by the
writer of a statement using multi-item scales. This is also the first
computational work distinguishing between multiple forms of empathy, namely
empathic concern and personal distress, as recognized throughout psychology.
Finally,
we present experimental results for three different predictive models, of which
a CNN performs the best.
Comment: To appear at EMNLP 2018
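The best-performing model class is a text CNN predicting continuous empathy scores. A minimal forward pass (conv, ReLU, max-over-time pooling, linear output) is sketched below; all shapes and the two-score output (empathic concern, personal distress) are illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

def cnn_scores(emb, filters, w_out, b_out):
    """Tiny 1-D text-CNN forward pass: conv -> ReLU -> max-over-time
    pooling -> linear regression head.
    emb:     (seq_len, emb_dim) token embeddings
    filters: (n_filters, kernel, emb_dim) convolution filters
    w_out:   (n_filters, 2), b_out: (2,) -> two continuous scores."""
    seq_len = emb.shape[0]
    n_filters, kernel, _ = filters.shape
    # Valid 1-D convolution over the token dimension.
    conv = np.array([
        [np.sum(filters[f] * emb[t:t + kernel]) for t in range(seq_len - kernel + 1)]
        for f in range(n_filters)
    ])
    pooled = np.maximum(conv, 0).max(axis=1)  # ReLU + max-over-time pooling
    return pooled @ w_out + b_out

rng = np.random.default_rng(0)
scores = cnn_scores(rng.normal(size=(10, 8)),   # 10 tokens, 8-dim embeddings
                    rng.normal(size=(4, 3, 8)),  # 4 filters of width 3
                    rng.normal(size=(4, 2)),
                    np.zeros(2))
```

A trained version would learn the embeddings and filters by regressing against the multi-item empathy and distress scales in the gold-standard corpus.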